Last Update: 7/13/2025

Qwen Audio Transcription API

The Qwen Audio Transcription API converts audio into text and can be called with OpenAI's SDK. This document describes the endpoint, request parameters, and response structure.

Endpoint

POST https://platform.llmprovider.ai/v1/audio/transcriptions

Request Headers

Header	Value
Authorization	Bearer YOUR_API_KEY
Content-Type	multipart/form-data

Request Body

Parameter	Type	Description
file	file	The audio file object (not the file name) to transcribe, in one of these formats: `flac`, `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `ogg`, `wav`, or `webm`. Maximum file size: 20 MB.
model	string	ID of the model to use (e.g., `paraformer-v2`).
prompt	string	(Optional) Text to guide the model's style or continue a previous audio segment.
response_format	string	(Optional) The format of the transcript output (`json`, `text`, `srt`, `verbose_json`, or `vtt`). Default is `json`.
temperature	number	(Optional) The sampling temperature, between 0 and 1. Default is 0.
language	string	(Optional) The language of the input audio (e.g., `en`, `es`, `fr`).
timestamp_granularities[]	array	(Optional) The timestamp granularities to populate for this transcription.
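
The format and size constraints on `file` can be checked locally before uploading. The following is a minimal sketch, not part of the API: the helper name `check_audio_file` is hypothetical, and it assumes the 20 MB limit means 20 × 1024 × 1024 bytes and that the file extension reflects the actual audio format.

```python
from pathlib import Path

# Formats listed in the Request Body table.
SUPPORTED_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}

# Assumption: the limit is read as 20 MiB; the API may count 20,000,000 bytes instead.
MAX_FILE_BYTES = 20 * 1024 * 1024


def check_audio_file(path: str) -> list[str]:
    """Return the problems found with the file; an empty list means it looks uploadable."""
    file_path = Path(path)
    if not file_path.is_file():
        return [f"{path}: file not found"]

    problems = []
    # Compare the extension case-insensitively, without the leading dot.
    extension = file_path.suffix.lower().lstrip(".")
    if extension not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {extension or '(none)'}")

    size = file_path.stat().st_size
    if size > MAX_FILE_BYTES:
        problems.append(f"file is {size} bytes, above the {MAX_FILE_BYTES}-byte limit")
    return problems


# Hypothetical local file name, matching the request examples in this document.
print(check_audio_file("audio.mp3"))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue at once.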

Response Body

Returns a transcription object, or a verbose transcription object when `response_format` is `verbose_json`.

The transcription object (JSON)

Parameter	Type	Description
text	string	The transcribed text.

{
  "text": "Hello, this is the transcribed text from the audio file."
}
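
Reading the transcript out of a response body might look like the sketch below. The helper name `transcript_text` is hypothetical. The sketch assumes that the `text`, `srt`, and `vtt` formats come back as plain text rather than JSON, which this document does not state explicitly.

```python
import json


def transcript_text(body: str, response_format: str = "json") -> str:
    """Return the transcript from a raw response body."""
    if response_format in ("json", "verbose_json"):
        # Both JSON variants carry the transcript in the top-level "text" field.
        return json.loads(body)["text"]
    # Assumption: the remaining formats are returned as plain text.
    return body


sample = '{"text": "Hello, this is the transcribed text from the audio file."}'
print(transcript_text(sample))
```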

The transcription object (Verbose JSON)

Parameter	Type	Description
task	string	The task performed by the model.
language	string	The language of the input audio.
duration	number	The duration of the audio in seconds.
segments	array	Segments of the transcribed text and their corresponding details.
text	string	The transcribed text.
words	array	Extracted words and their corresponding timestamps.

{
  "task": "transcribe",
  "language": "en",
  "duration": 2.95,
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 2.95,
      "text": "Hello, this is the transcribed text from the audio file.",
      "tokens": [
        50364,
        2425,
        11,
        359,
        307,
        1161,
        1123,
        422,
        264,
        1467,
        1780
      ],
      "temperature": 0.0,
      "avg_logprob": -0.458,
      "compression_ratio": 0.688,
      "no_speech_prob": 0.0192
    }
  ],
  "text": "Hello, this is the transcribed text from the audio file."
}
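
The `start` and `end` values in each segment are offsets in seconds, so a verbose response can be turned into subtitles on the client side. The following sketch converts the `segments` array into SubRip (SRT) text. The function names are illustrative, and the sample dictionary is an abbreviated copy of the response above.

```python
def srt_timestamp(seconds: float) -> str:
    """Format an offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(verbose: dict) -> str:
    """Build SRT text from the `segments` array of a verbose transcription object."""
    blocks = []
    for index, segment in enumerate(verbose.get("segments", []), start=1):
        start = srt_timestamp(segment["start"])
        end = srt_timestamp(segment["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}")
    return "\n\n".join(blocks) + "\n" if blocks else ""


verbose = {
    "task": "transcribe",
    "language": "en",
    "duration": 2.95,
    "segments": [
        {
            "id": 0,
            "start": 0.0,
            "end": 2.95,
            "text": "Hello, this is the transcribed text from the audio file.",
        }
    ],
    "text": "Hello, this is the transcribed text from the audio file.",
}
print(segments_to_srt(verbose))
```

When only subtitles are needed, requesting `response_format` as `srt` directly is likely simpler. A conversion like this is mainly useful when the other verbose fields are needed as well.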

Example Request

Shell

curl -X POST https://platform.llmprovider.ai/v1/audio/transcriptions \
    -H "Authorization: Bearer $YOUR_API_KEY" \
    -H "Content-Type: multipart/form-data" \
    -F file="@audio.mp3" \
    -F model="paraformer-v2"

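Node.js

const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

// Read the API key from the environment rather than hard-coding it.
const YOUR_API_KEY = process.env.YOUR_API_KEY;

// Build the multipart body: the audio file stream plus the model ID.
const formData = new FormData();
formData.append('file', fs.createReadStream('audio.mp3'));
formData.append('model', 'paraformer-v2');

axios.post('https://platform.llmprovider.ai/v1/audio/transcriptions', formData, {
    headers: {
        'Authorization': `Bearer ${YOUR_API_KEY}`,
        // getHeaders() supplies the multipart Content-Type with its boundary.
        ...formData.getHeaders()
    }
})
    .then(response => {
        console.log(response.data);
    })
    .catch(error => {
        console.error('Error:', error);
    });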

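Python

import os

import requests

# Read the API key from the environment rather than hard-coding it.
YOUR_API_KEY = os.environ["YOUR_API_KEY"]

headers = {
    "Authorization": f"Bearer {YOUR_API_KEY}"
}

# Open the file in binary mode; the context manager closes it after the upload.
with open("audio.mp3", "rb") as audio_file:
    response = requests.post(
        "https://platform.llmprovider.ai/v1/audio/transcriptions",
        headers=headers,
        files={"file": audio_file},
        data={"model": "paraformer-v2"},
    )

print(response.json())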
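
The request examples send only the required fields. The optional parameters from the Request Body table can presumably travel as additional form fields alongside `model`. The following sketch collects them into one dictionary. The helper name `build_form_fields` is hypothetical, and the checks mirror only the constraints stated in this document.

```python
# Output formats listed in the Request Body table.
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def build_form_fields(model, prompt=None, response_format=None, temperature=None, language=None):
    """Collect the non-file form fields, leaving out optional parameters that were not set."""
    if response_format is not None and response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    optional = {
        "prompt": prompt,
        "response_format": response_format,
        "temperature": temperature,
        "language": language,
    }
    fields = {"model": model}
    # Omit unset parameters so the server applies its own defaults.
    fields.update({name: value for name, value in optional.items() if value is not None})
    return fields


fields = build_form_fields("paraformer-v2", response_format="verbose_json", language="en")
print(fields)
```

The resulting dictionary can be passed as the `data` argument of `requests.post`, next to `files`.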

For any questions or further assistance, please contact us at [email protected].
